24 ABSTRACT 

25 Key biological properties such as high genetic diversity and high evolutionary 

26 rate enhance the potential of certain RNA viruses to adapt and emerge. 

27 Identifying viruses with these properties in their natural hosts could dramatically 

28 improve disease forecasting and surveillance. Recently, we discovered two novel 

29 members of the viral family Arteriviridae: simian hemorrhagic fever virus 

30 (SHFV)-krc1 and SHFV-krc2, infecting a single wild red colobus (Procolobus rufomitratus 

31 tephrosceles) in Kibale National Park, Uganda. Nearly nothing is known about 

32 the biological properties of SHFVs in nature, although the SHFV type strain, 

33 SHFV-LVR, has caused devastating outbreaks of viral hemorrhagic fever in 

34 captive macaques. Here we detected SHFV-krc1 and SHFV-krc2 in 40% and 

35 47% of 60 wild red colobus tested, respectively. We found viral loads in excess of 

36 $10^{6}$-$10^{7}$ RNA copies per milliliter of blood plasma for each of these viruses. 

37 SHFV-krc1 and SHFV-krc2 also showed high genetic diversity at both the inter- 

38 and intra-host levels. Analyses of synonymous and non-synonymous nucleotide 

39 diversity across viral genomes revealed patterns suggestive of positive selection 

40 in SHFV open reading frames (ORF) 5 (SHFV-krc2 only) and 7 (SHFV-krc1 and 

41 SHFV-krc2). Thus, these viruses share several important properties with some of 

42 the most rapidly evolving, emergent RNA viruses. 
43 
44 INTRODUCTION 

45 Certain RNA viruses have biological properties that make them particularly likely 

46 to emerge [1]. High genetic diversity, high evolutionary rates, and high viral loads 

47 are all thought to enhance the potential of some RNA viruses to adapt to 

48 changing environments by evading immune responses within hosts or enabling 

49 the invasion of new host populations [2,3]. It is widely accepted that identifying 

50 and characterizing such viruses in their natural hosts is important for disease 

51 monitoring and prevention [4-7]. For example, the origin of human 

52 immunodeficiency virus (HIV)-1, group M (the strain responsible for the AIDS 

53 pandemic) from simian immunodeficiency viruses (SIVs) of wild chimpanzees in 

54 Central Africa [8] underscores the importance of "pandemic prevention," as well 

55 as the importance of non-human primates as reservoirs of potentially important 

56 viruses. 

57 The simian hemorrhagic fever viruses (SHFVs) are a poorly understood 

58 group of single stranded, positive-sense RNA viruses within the family 

59 Arteriviridae that have only recently been detected in wild primates [9,10]. Almost 

60 everything known about these viruses comes from the type strain of simian 

61 hemorrhagic fever virus (SHFV-LVR), which caused several "explosive" disease 

62 outbreaks in captive macaques (Macaca assamensis, M. arctoides, M. 

63 fascicularis, M. nemestrina, and M. mulatta) between 1964 and 1996 [11-14]. The 

64 lethality of SHFV infection in these Asian Old World monkeys (OWMs) suggested 

65 that macaques were highly susceptible to the virus, and were therefore unlikely 

66 to be natural hosts of SHFV-LVR. Further investigation revealed that monkeys of 
67 several African OWM species - specifically patas monkeys (Erythrocebus patas), 

68 grivets (Chlorocebus aethiops), and Guinea baboons (Papio papio) - could 

69 persistently harbor SHFV-LVR in captivity without signs of disease [15]. Although 

70 this finding implicated African OWMs as the immediate source of SHFV-LVR in 

71 the captive outbreaks, neither SHFV-LVR nor any of its relatives had ever been 

72 identified in a wild animal until recently [9,16]. 

73 In 2011, we discovered two highly divergent simian arteriviruses infecting 

74 a single wild red colobus (Procolobus rufomitratus tephrosceles) in Kibale 

75 National Park, Uganda (hereafter Kibale), which we named SHFV-krc1 and 

76 SHFV-krc2 [9]. Subsequently, we discovered additional, highly divergent simian 

77 arteriviruses in red-tailed guenons (Cercopithecus ascanius) from the same 

78 location [10]. Here we characterize SHFV-krc1 and SHFV-krc2 in 60 red colobus 

79 from Kibale. We show that these viruses infect a high proportion of red colobus in 

80 this population, replicate to high titers in infected monkeys, and have high genetic 

81 diversity, both within and among hosts. Our findings demonstrate that these 

82 viruses possess properties that are associated with the rapid evolutionary 

83 adaptability characteristic of many emerging RNA viruses. 






Downloaded from http://biorxiv.org/on September 18, 2014 

84 MATERIALS AND METHODS 

85 Arterivirus genome organization and nomenclature. SHFV genomes contain 

86 a duplication of four open reading frames (ORFs) relative to the other viruses in 

87 the Arteriviridae family: porcine reproductive and respiratory syndrome virus 

88 (PRRSV), equine arteritis virus (EAV), and lactate dehydrogenase-elevating virus 

89 of mice (LDV). Previous publications regarding SHFV have treated the naming of 

90 these additional ORFs inconsistently. For clarity, we have adopted the 

91 nomenclature scheme presented in [17], and have included a schematic (Figure 

92 1) to maintain continuity with previous publications. 

93 Ethics statement. All animal use in this study followed the guidelines of 

94 the Weatherall Report on the use of non-human primates in research. Specific 

95 protocols were adopted to minimize suffering through anesthesia and other 

96 means during capture, immobilization, and sampling of the non-human primates. 

97 These included use of anesthesia during capture (Ketamine/Xylazine, 

98 administered intramuscularly with a variable-pressure pneumatic rifle), 

99 minimization of immobilization time and the use of an anesthetic reversal agent 

100 (Atipamezole) to reduce recovery time, and conservative limits on blood sample 

101 volumes (<1% body weight), as previously described [9]. Following sampling, all 

102 animals were released back to their social group without incident [18]. All 

103 research was approved by the Uganda Wildlife Authority (permit 

104 UWA/TDO/33/02), the Uganda National Council for Science and Technology 

105 (permit HS 364), and the University of Wisconsin Animal Care and Use 

106 Committee (protocol V01409-0-02-09) prior to initiation of the study. 
107 Study site and sample collection. Red colobus were sampled between 

108 2/5/2010 and 7/22/2012 in Kibale National Park, Uganda, a 795 km² semi-deciduous 

109 park in western Uganda (0°13'-0°41' N, 30°19'-30°32' E) known for its 

110 exceptional density of primates belonging to diverse species. Blood was 

111 separated using centrifugation and plasma was frozen immediately in liquid 

112 nitrogen for storage and transport to the United States. Samples were shipped in 

113 an IATA-approved dry shipper to the USA for further analysis at the Wisconsin 

114 National Primate Research Center in accordance with CITES permit #002290 

115 (Uganda). 

116 Molecular methods. Samples were processed for sequencing in a 

117 biosafety level 3 laboratory as described previously [9,10]. Briefly, for each 

118 animal, one ml of blood plasma was filtered (0.45 µm) and viral RNA was isolated 

119 using the Qiagen QIAamp MinElute virus spin kit (Qiagen, Hilden, Germany), 

120 omitting carrier RNA. DNase treatment was performed and cDNA synthesis was 

121 accomplished using random hexamers. Samples were fragmented and 

122 sequencing adaptors were added using the Nextera DNA Sample Preparation Kit 

123 (Illumina, San Diego, CA, USA). Deep sequencing was performed on the Illumina 

124 MiSeq (Illumina, San Diego, CA, USA). 

125 SHFV detection by quantitative RT-PCR. We developed a multiplex 

126 quantitative RT-PCR (qRT-PCR) assay to quantify plasma viral RNA of both 

127 SHFV-krc1 and SHFV-krc2 in each sample. Taqman assays were designed with 

128 amplification primers specific for either SHFV-krc1 (5'-ACACGGCTACCCTTACTCC-3' 

129 and 5'-TCGAGGTTAARCGGTTGAGA-3') or 
130 SHFV-krc2 (5'-AACGCGCACCAACCACTATG-3' and 

131 5'-GCGTGTTGAGGCCCTAATTTG-3'). The SHFV-krc1 probe (5'-Quasar 670-TTCTGGTCCTCTTGCGAAGGC-BHQ2-3') 

132 and SHFV-krc2 probe (5'-6-FAM-TTTGCTCAAGCCAATGACCTGCG-BHQ1-3') 

133 were also virus-specific. The 

134 fluorophores used do not produce overlapping spectra, so no color compensation 

135 was required. Viral RNA was reverse transcribed and quantified using the 

136 SuperScript III One-Step qRT-PCR system (Invitrogen, Carlsbad, CA) on 

137 a LightCycler 480 (Roche, Indianapolis, IN). Reverse transcription was carried 

138 out at 37°C for 15 min and then 50°C for 30 min followed by two minutes at 95°C 

139 and 50 cycles of amplification as follows: 95°C for 15 sec and 60°C for 1 min. 

140 The reaction mixture contained MgSO4 at a final concentration of 3.0 mM, 150 ng 

141 random primers (Promega, Madison, WI), with all 4 amplification primers at a 

142 concentration of 600 nM and both probes at a concentration of 100 nM. 

143 Genetic analyses. Sequence data were analyzed using CLC Genomics 

144 Workbench 5.5 (CLC bio, Aarhus, Denmark) and Geneious R5 (Biomatters, 

145 Auckland, New Zealand). Low quality (<Q25) and short reads (<100bp) were 

146 removed and the full genome sequences for each virus were acquired using de 

147 novo assembly. Due to the approximately 52% nucleotide sequence similarity 

148 between the genomes of SHFV-krc1 and SHFV-krc2, and the high frequency of 

149 co-infections in our animal cohort, we devised a method to minimize mapping of 

150 SHFV-krc1 reads to SHFV-krc2 (and vice versa) within a co-infected animal. 

151 Briefly, total reads from a co-infected animal were mapped to the SHFV-krc1 

152 consensus sequence generated from de novo assembly and "unmapped reads" 
153 were collected, then mapped to the SHFV-krc2 consensus sequence obtained 

154 from de novo assembly. The resulting SHFV-krc2 consensus sequence was then 

155 used as the reference for mapping and collecting unmapped reads to map to the 

156 SHFV-krc1 consensus sequence generated from de novo assembly. This 

157 process was repeated until changes between the reference and the consensus 

158 sequences were not observed for either virus. Using this method, reads 

159 corresponding to SHFV-krc1 and SHFV-krc2 were reliably segregated in co- 

160 infected animals, with less than 0.2% of SHFV-specific reads mapping to both 

161 viruses. The average coverage per genome was 5,654x (range 118-19,115x) for 

162 SHFV-krc1 variants and 2,264x (range 94-6,613x) for SHFV-krc2 variants. For 

163 intra-host genetic analysis, sequencing reads were mapped to the corresponding 

164 consensus sequence for each variant. Single nucleotide polymorphism (SNP) 

165 reports were generated in Geneious, with a minimum coverage threshold of 100 

166 reads and a minimum frequency threshold of five percent. 
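
The iterative segregation of reads described above can be illustrated with a small sketch. This is not the CLC Genomics Workbench or Geneious workflow used in the study: the ungapped, mismatch-limited matcher below is a deliberately simple stand-in for a real read mapper, and the function names (`map_read`, `split_reads`, `consensus`, `segregate`) and the `max_mismatch` threshold are hypothetical choices made only for illustration.

```python
from collections import Counter


def map_read(read, ref, max_mismatch):
    """Return the best ungapped offset of read on ref, or None if too divergent.

    Toy stand-in for a read mapper (an assumption, not the study's software).
    """
    best = None
    for off in range(len(ref) - len(read) + 1):
        mm = sum(a != b for a, b in zip(read, ref[off:off + len(read)]))
        if mm <= max_mismatch and (best is None or mm < best[1]):
            best = (off, mm)
    return None if best is None else best[0]


def split_reads(reads, ref, max_mismatch):
    """Split reads into [(read, offset)] that map to ref and reads that do not."""
    mapped, unmapped = [], []
    for read in reads:
        offset = map_read(read, ref, max_mismatch)
        if offset is None:
            unmapped.append(read)
        else:
            mapped.append((read, offset))
    return mapped, unmapped


def consensus(mapped, ref):
    """Majority base per position; positions without coverage keep the ref base."""
    piles = [Counter() for _ in ref]
    for read, off in mapped:
        for i, base in enumerate(read):
            piles[off + i][base] += 1
    return "".join(p.most_common(1)[0][0] if p else ref[i]
                   for i, p in enumerate(piles))


def segregate(reads, ref_a, ref_b, max_mismatch=1, max_rounds=10):
    """Alternate between the two references until neither consensus changes."""
    for _ in range(max_rounds):
        # Reads that do not map to virus A are mapped to virus B.
        _, not_a = split_reads(reads, ref_a, max_mismatch)
        mapped_b, _ = split_reads(not_a, ref_b, max_mismatch)
        new_b = consensus(mapped_b, ref_b)
        # The updated B consensus is then used to exclude B reads before mapping to A.
        _, not_b = split_reads(reads, new_b, max_mismatch)
        mapped_a, _ = split_reads(not_b, ref_a, max_mismatch)
        new_a = consensus(mapped_a, ref_a)
        if new_a == ref_a and new_b == ref_b:
            break
        ref_a, ref_b = new_a, new_b
    return ref_a, ref_b
```

In this sketch, virus A plays the role of SHFV-krc1 and virus B the role of SHFV-krc2; the loop stops once a round leaves both consensus sequences unchanged, mirroring the stopping rule stated in the text.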

167 Evolutionary analyses. The synonymous nucleotide diversity ($\pi_S$) and 

168 the non-synonymous nucleotide diversity ($\pi_N$) were estimated for each ORF 

169 individually from SNP reports generated by mapping sequencing reads to their 

170 corresponding consensus sequence. We estimated $\pi_S = n_s / L_s$ and $\pi_N = n_n / L_n$, 

171 where $n_s$ is the mean number of pairwise synonymous differences; $n_n$ is the 

172 mean number of pairwise non-synonymous differences; $L_s$ is the number of 

173 synonymous sites; and $L_n$ is the number of non-synonymous sites. $L_s$ and $L_n$ were 

175 hosts, variant consensus sequences were aligned by the CLUSTAL algorithm in 






176 MEGA 5.05 [20]. Estimating $\pi_S$ and $\pi_N$ separately for each ORF in each virus 

177 from co-infected animals, we used a factorial analysis of variance to test for main 

178 effects of the virus (SHFV-krc1 vs. SHFV-krc2) and the ORF, and for virus-by- 

179 ORF interactions. In the case of $\pi_S$, there were highly significant main effects of 

180 virus ($F_{1,459}$ = 41.31; p < 0.001) and of ORF ($F_{13,459}$ = 14.07; p < 0.001), but 

181 there was not a significant virus-by-ORF interaction ($F_{13,459}$ = 1.35; n.s.). In the 

182 case of $\pi_N$, there were significant main effects of virus ($F_{1,459}$ = 4.42; p = 0.036) 

183 and of ORF ($F_{13,459}$ = 53.26; p < 0.001), and there was a highly significant virus- 

184 by-ORF interaction ($F_{13,459}$ = 4.39; p < 0.001). Sliding window analysis involved 

185 estimating $\pi_S$ and $\pi_N$ in a sliding window of 9 codons, with each window numbered 

186 by the position in the sequence alignment of its first codon. 
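
The diversity estimates and the 9-codon sliding window described above can be sketched as follows. Several things here are assumptions rather than the study's method: each SNP is treated as biallelic, so its contribution to the mean number of pairwise differences is approximated as 2p(1 - p) for variant frequency p; the synonymous and non-synonymous site counts are taken as inputs (the text estimates them by the method in [19]); and the record fields and function names are hypothetical. The coverage and frequency thresholds are the ones stated for the SNP reports (100 reads, five percent).

```python
def pairwise_diff(freq):
    """Expected pairwise difference at a biallelic site (assumption: 2p(1-p))."""
    return 2.0 * freq * (1.0 - freq)


def passes_thresholds(snp, min_cov=100, min_freq=0.05):
    """SNP-report thresholds from the text: >= 100 reads and >= 5% frequency."""
    return snp["coverage"] >= min_cov and snp["freq"] >= min_freq


def pi_s_and_pi_n(snps, syn_sites, nonsyn_sites):
    """Return (pi_S, pi_N) = (n_s / L_s, n_n / L_n) for one ORF or window.

    syn_sites and nonsyn_sites must be non-zero.
    """
    kept = [s for s in snps if passes_thresholds(s)]
    n_s = sum(pairwise_diff(s["freq"]) for s in kept if s["synonymous"])
    n_n = sum(pairwise_diff(s["freq"]) for s in kept if not s["synonymous"])
    return n_s / syn_sites, n_n / nonsyn_sites


def sliding_window(snps, sites_per_codon, window=9):
    """Return [(first_codon, pi_S, pi_N)] for every window of `window` codons.

    sites_per_codon[i] holds (synonymous_sites, nonsynonymous_sites) for
    codon i + 1; codons are numbered from 1 and each window is labeled by
    its first codon.
    """
    results = []
    for start in range(1, len(sites_per_codon) - window + 2):
        codons = range(start, start + window)
        l_s = sum(sites_per_codon[c - 1][0] for c in codons)
        l_n = sum(sites_per_codon[c - 1][1] for c in codons)
        in_window = [s for s in snps if start <= s["codon"] < start + window]
        results.append((start,) + pi_s_and_pi_n(in_window, l_s, l_n))
    return results
```

A window in which $\pi_N$ exceeds $\pi_S$ would then be flagged for closer inspection, which is how the peaks discussed in the Results are read.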

187 Layercake visualization. We developed a specialized visualization tool 

188 called LayerCake for this dataset. This tool allows visual comparison of variants 

189 for multiple individuals simultaneously, encoding sequences as bands of color, 

190 with redder sections of the band corresponding to regions with a higher 

191 proportion of polymorphic reads. Downloadable versions of the krc1 and krc2 

192 datasets are available, along with a generalized tutorial for interpreting 

193 LayerCake displays, at http://graphics.cs.wisc.edu/Vis/LayerCake/ . 
194 RESULTS 

195 Sample collection and infection frequency of SHFV-krc1 and SHFV-krc2 in 

196 Kibale red colobus. Blood samples were collected from 60 adult red colobus 

197 residing in the Kanyawara area of Kibale over a period of 2.5 years. These 

198 animals represent approximately half of a defined social group, but comprise a 

199 relatively small proportion of the total red colobus population in Kibale [21]. All 

200 animals appeared normal and healthy at the time of sampling. RNA was isolated 

201 from the blood plasma of each animal and "unbiased" deep sequencing was 

202 performed on an Illumina MiSeq machine as previously described [9,10]. De 

203 novo assembly and iterative mapping of sequencing reads yielded 52 near full- 

204 length SHFV consensus sequences (GenBank accession numbers KC787607- 

205 KC787658). Twenty-four animals (40.0%) were infected with SHFV-krc1, and 28 

206 animals (46.7%) were infected with SHFV-krc2. Twenty-one animals (35.0%) 

207 were co-infected with both SHFV-krc1 and SHFV-krc2 (Figure 2). 
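
The counts reported above also determine the number of mono-infected and uninfected animals. The short sketch below only restates that arithmetic from the figures given in the text (60 animals, 24 SHFV-krc1 infections, 28 SHFV-krc2 infections, 21 co-infections); the function name and dictionary keys are hypothetical.

```python
def infection_summary(total, krc1, krc2, both):
    """Derive mono-infection and uninfected counts from the reported totals."""
    infected = krc1 + krc2 - both  # animals carrying at least one virus
    counts = {
        "krc1": krc1,
        "krc2": krc2,
        "co_infected": both,
        "krc1_only": krc1 - both,
        "krc2_only": krc2 - both,
        "infected": infected,
        "uninfected": total - infected,
    }
    # Report each count together with its percentage of all sampled animals.
    return {k: (n, round(100.0 * n / total, 1)) for k, n in counts.items()}


summary = infection_summary(total=60, krc1=24, krc2=28, both=21)
for name, (n, pct) in summary.items():
    print(f"{name}: {n} ({pct}%)")
```

The 52 consensus sequences reported above correspond to the 24 SHFV-krc1 and 28 SHFV-krc2 infections, with co-infected animals contributing one sequence per virus.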

208 Viral loads of SHFV-krc1 and SHFV-krc2 in the Kibale red colobus. To 

209 estimate the viral load of SHFV-krc1 and SHFV-krc2 in infected red colobus, a 

210 strain-specific qRT-PCR assay was designed to amplify highly conserved regions 

211 in ORF7 of the SHFV-krc1 and SHFV-krc2 genomes. This assay was used to 

212 assess the viral burden in cell-free plasma for each animal found to be positive 

213 by deep sequencing. SHFV-krc1 viremia was consistently high, averaging 

214 $5.1 \times 10^{7}$ vRNA copies/ml plasma (range: $1.5 \times 10^{6}$-$1.9 \times 10^{8}$ copies/ml plasma) 

215 (Figure 3A). SHFV-krc2 loads were more varied (range: $3.4 \times 10^{4}$-$4.1 \times 10^{7}$ 

216 copies/ml) and significantly lower than SHFV-krc1 with an average plasma titer of 
217 $7.5 \times 10^{6}$ vRNA copies/ml plasma (p = 0.0001, two-tailed t-test). Although 

218 instances of mono-infection were scarce relative to co-infection, mono/co- 

219 infection status did not impact the load of either virus to a statistically significant 

220 extent (mono- vs. co-infected: p = 0.063 for SHFV-krc1, p = 0.089 for SHFV-krc2, 

221 two-tailed t-test, Figure 3B,C). 

222 Consensus-level genetic diversity among SHFV-krc1 and SHFV-krc2 

223 variants. To quantify the genetic diversity of SHFV-krc1 and SHFV-krc2 within 

224 the red colobus population, we examined similarity among the 24 SHFV-krc1 and 

225 28 SHFV-krc2 variants by comparing the nucleotide consensus sequences of 

226 each viral variant (Figure 4). These consensus sequences represent the majority 

227 nucleotide base present at each position of the genome for the viral population 

228 within each host. Because RNA viruses often exist within a host as a highly 

229 heterogeneous population (i.e. "mutant swarm"), the consensus sequence may 

230 not actually be present in the within-host viral population [22,23]. Nevertheless, 

231 the construction of consensus sequences allowed us to compare the average 

232 viral population of each variant (i.e. inter-host diversity). For SHFV-krc1, percent 

233 pairwise nucleotide identity between variants ranged from 86.9%-99.5%. A 

234 highly related core group (SHFV-krc1 from red colobus 06, 28, 33, 22, 25, 34, 54, 

235 31, 40, 05, 56, 44, 08, 45, 01) with pairwise nucleotide identities ranging from 

236 94.5%-99.3% comprised 63% of the variants (Figure 4A). A distinct second 

237 group (SHFV-krc1 from red colobus 09, 30, 10, 18, 07, 60) with a slightly wider 

238 range of similarity (92.0%-99.5% pairwise nucleotide identity) made up an 

239 additional 21% of variants. A similar pattern, with two distinct groups, was 
240 observed for SHFV-krc2 variants (range: 89.67%-99.48% pairwise nucleotide 

241 identity, Figure 4B). However, patterns of SHFV-krc1 genetic similarity among 

242 hosts were different from patterns of SHFV-krc2 similarity among hosts (data not 

243 shown). Interestingly, SHFV-krc1 variants from red colobus 13 and 61 were 

244 highly dissimilar to all other SHFV-krc1 variants identified, with pairwise 

245 nucleotide identities ranging from 86.8%-88.7%. 
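
Percent pairwise nucleotide identity between aligned consensus sequences can be computed along the following lines. This is a minimal sketch, not the calculation performed by the software used in the study: it assumes the sequences are already aligned to equal length and that alignment columns containing a gap in either sequence are skipped, which is only one of several possible conventions. The function names are hypothetical.

```python
from itertools import combinations


def percent_identity(seq_a, seq_b, gap="-"):
    """Percent of compared alignment columns at which two sequences agree.

    Assumption: columns with a gap in either sequence are not compared.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        matches += a == b
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def identity_range(aligned):
    """Return (min, max) percent identity over all pairs of named sequences."""
    values = [percent_identity(aligned[x], aligned[y])
              for x, y in combinations(sorted(aligned), 2)]
    return min(values), max(values)
```

Applied to the aligned consensus sequences of one virus, `identity_range` would give a range of the kind reported above for SHFV-krc1 and SHFV-krc2.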

246 Within-host genetic diversity of SHFV-krc1 and SHFV-krc2. To 

247 examine the genetic diversity of SHFV-krc1 and SHFV-krc2 within individual 

248 monkeys, we calculated the non-synonymous and synonymous nucleotide 

249 diversity, $\pi_N$ and $\pi_S$ respectively, for each within-host viral population using deep 

250 sequencing reads from each viral variant. Comparing $\pi_N$ and $\pi_S$ from specific 

251 regions of a viral genome can reveal the mode of natural selection acting on a 

252 region. For example, $\pi_N < \pi_S$ is indicative of negative selection acting to remove 

253 deleterious protein-coding mutations, while $\pi_N > \pi_S$ is suggestive of positive 

254 selection acting to drive beneficial protein-coding mutations to fixation. We found 

255 that, overall, negative selection acting against deleterious non-synonymous 

256 mutations predominated for both SHFV-krc1 and SHFV-krc2. In SHFV-krc1, $\pi_S$ 

257 exceeded $\pi_N$ by a ratio of over 6:1, whereas in SHFV-krc2, $\pi_S$ exceeded $\pi_N$ by a 

258 ratio of nearly 5:1. Both $\pi_S$ and $\pi_N$ were significantly greater in SHFV-krc1 than in 

259 SHFV-krc2 (p = 0.002 and p = 0.021, paired t-test), indicating greater overall 

260 nucleotide diversity in SHFV-krc1 than in SHFV-krc2 (Figure 5). A positive 

261 correlation between viral load and both $\pi_S$ and $\pi_N$ was observed. However, mean 

262 $\pi_S$ and $\pi_N$ did not differ significantly between co-infected monkeys and those 
263 infected with only SHFV-krc1 or SHFV-krc2 (data not shown). 

264 The organization of ORFs in the genomes of SHFV-krc1 and SHFV-krc2 

265 was the same as described previously (Figure 1) [9,10,17], so we used a factorial 

266 analysis of variance approach to investigate $\pi_S$ and $\pi_N$ in ORFs in both viruses. 

267 In general, 3'-proximal ORFs displayed more non-synonymous diversity than 5'- 

268 proximal ORFs, suggesting that the proteins encoded by 5'-proximal ORFs may 

269 be more functionally constrained than those encoded by 3'-proximal ORFs. 

270 However, the extent to which underlying RNA structures may have affected this 

271 analysis is unknown [24-26]. ORF5 showed the highest mean $\pi_N$ in SHFV-krc1 

272 and among the highest in SHFV-krc2 (Figure 6). In the case of both SHFV-krc1 

273 and SHFV-krc2, a sliding window plot of 9 codons revealed peaks of $\pi_N$ 

274 corresponding to codons 1-46 and 64-100 of ORF5 (Figure 7A,B). The latter 

275 peak (codons 64-100) also involved high $\pi_S$, suggesting a mutational hotspot. 

276 Interestingly, $\pi_N$ was substantially higher in ORF3 of SHFV-krc2 than of SHFV-krc1 

277 (Figure 6). Sliding window analysis revealed a substantial peak of $\pi_N$ 

278 between codons 141-173 of SHFV-krc2 ORF3 (Figure 7C) that greatly exceeded 

279 $\pi_S$, suggesting strong positive selection in this region of SHFV-krc2. This peak of 

280 $\pi_N$ corresponded to a region of variable length rich in acidic residues. An 

281 analogous peak of $\pi_N$ in ORF3 of SHFV-krc1 was not found, although a unique 

282 peak of $\pi_N$ was identified between codons 50 and 68 (Figure 7D). Of note, a high 

283 degree of variability in predicted N-glycosylation [27] was associated with each 

284 instance of elevated $\pi_N$ in ORF3 and ORF5 for both SHFV-krc1 and SHFV-krc2. 

285 For peaks of $\pi_N$ found in regions of ORF3 and ORF5 that shared sequence with 
286 an overlapping alternative ORF, sliding window plot analysis in the alternative 

287 ORFs revealed peaks of $\pi_S$, demonstrating that observed elevations in $\pi_N$ were 

288 ORF-specific, as expected [28,29] (data not shown). 

289 Unique patterns of inter- and intra-host variation can be visualized on a 

290 genome-wide scale for all SHFV-krc1 and SHFV-krc2 variants using our custom- 

291 built LayerCake software: http://graphics.cs.wisc.edu/Vis/LayerCake/ . 
292 DISCUSSION 

293 This study provides the first systematic analysis of SHFV genetic diversity in a 

294 population of wild non-human primates. Our findings show that SHFV-krc1 and 

295 SHFV-krc2 have a high frequency of infection in the red colobus population of 

296 Kibale, and that these viruses achieve high titers in the blood of infected 

297 monkeys. Our study also details, for the first time, the genetic diversity of SHFV-krc1 

298 and SHFV-krc2 both within and among infected hosts. We draw particular 

299 attention to the signatures of natural selection identified throughout the genomes 

300 of these viruses, with an emphasis on signatures of positive selection identified in 

301 ORFs 3 and 5. 

302 To date, primates from only two species - the red colobus and red-tailed 

303 guenon - have been found to harbor simian arteriviruses in the wild [9,10]. 

304 However, the origins and host-ranges of these viruses are not clear. Our findings 

305 support the hypothesis that simian arteriviruses are endemic to African OWMs 

306 and cause little to no clinical disease in these hosts. However, when introduced 

307 into Asian OWMs, these viruses may be lethal, as exemplified by SHFV-LVR 

308 [13,30]. This pattern of pathogenesis is similar to SIV [31] and, like SIV, the 

309 simian arteriviruses appear to be well host-adapted, which suggests an ancient 

310 evolutionary relationship between these viruses and their African OWM hosts. 

311 This is in contrast to the arterivirus PRRSV, which emerged suddenly in pig 

312 populations across the globe in the 1980s [32]. Taken together, this implies that 

313 the prevalence and diversity of the Arteriviridae, including the simian arterivirus 

314 group, may be greater than currently appreciated. 
315 SHFV-krc1 and SHFV-krc2 display many biological properties associated 

316 with the potential for rapid evolution - a feature that is shared by many emergent 

317 RNA viruses. For example, high diversity at the population level (inter-host 

318 diversity) can facilitate speciation, and related yet distinct viruses can recombine 

319 [31,33]. High within-host diversity also enables a virus to escape the host 

320 immune response, alter tropism, and infect new host species [34,35]. In these 

321 contexts, high viral load increases the probability of transmission by "widening" 

322 the population bottleneck that often reduces the fitness of an RNA virus upon 

323 transmission [36-38]. Such features enhance the ability of a virus to adapt to 

324 changing environments and have been implicated in the ability of some viruses to 

325 transmit across species barriers [2]. Although the arteriviruses in general are 

326 considered to be highly specific for their hosts, we note that SHFV-LVR and 

327 related viruses have been transmitted between primate species from 

328 presumptive African primate hosts into Asian macaques on several occasions 

329 [11-14,30]. Recent work suggests that the capacity for SHFVs to infect multiple 

330 primate species is not unique to SHFV-LVR, as experimental infection of 

331 macaques with SHFV-krc1 resulted in viral replication and clinical disease 

332 (unpublished data). The biological properties of SHFV-krc1 and SHFV-krc2 in a 

333 natural host that we have identified herein may help explain the propensity of the 

334 SHFVs to infect primates of species other than their natural host. Future 

335 investigation of these viruses should provide further insight into the full extent of 

336 their cross-species transmission potential. 
337 Our analysis shows that SHFV-krc1 and SHFV-krc2 are not merely highly 

338 divergent forms of the same virus, but in fact possess unique and distinct 

339 biological properties. Nucleotide diversity was consistently higher in SHFV-krc1 

340 than in SHFV-krc2. This is likely a result of the higher viral loads observed for 

341 SHFV-krc1, reflecting more extensive viral replication and a correspondingly 

342 higher rate of accumulation of within-host mutations [39]. This hypothesis is 

343 supported by positive correlations between viral load and both synonymous and 

344 non-synonymous nucleotide diversity (data not shown). Interestingly, viral load 

345 and nucleotide diversity for both SHFV-krc1 and SHFV-krc2 were not significantly 

346 impacted by the presence of the other virus (Figure 3). When viewed in light of 

347 the competitive exclusion principle [40] this finding suggests that the two viruses 

348 may occupy discrete niches within the red colobus host (e.g. tissue tropisms), 

349 possibly resulting in distinct aspects of infection that could contribute to the 

350 observed differences in infection frequency (Figure 2) and viral burden (Figure 3). 

351 The most significant difference in nucleotide diversity that we observed 

352 between SHFV-krc1 and SHFV-krc2 was found in ORF3 (Figures 6 and 7), which 

353 codes for the putative envelope glycoprotein GP3. GP3 of SHFV-krc1 and SHFV- 

354 krc2 appears similar in topology to GP3 of other arteriviruses, with predicted N- 

355 and C-terminal membrane-spanning domains separated by a heavily 

356 glycosylated ectodomain. While the precise function of GP3 in the arterivirus life- 

357 cycle remains elusive, GP3 is thought to be an important determinant of tissue 

358 tropism [41,42]. GP3 is also immunogenic [43,44] and glycans attached to the 

359 GP3 ectodomain may play a role in evasion of the humoral immune response 
360 through the shielding of neutralizing antibody epitopes [45]. It is possible that GP3 

361 has multiple functions, as GP3 of PRRSV and LDV have been found in both 

362 virion-associated and soluble secreted forms [46-50]. Our analysis revealed a 

363 distinct region of non-synonymous diversity suggestive of positive selection in 

364 ORF3 of SHFV-krc2 (codons 141-173) (Figure 7C). This region contained an 

365 unusually high density of acidic residues and multiple, variable putative N-glycosylation 

366 sites. Although a similar region was not found in ORF3 of SHFV-krc1, 

367 a unique peak of non-synonymous diversity was identified between codons 

368 50-68 of ORF3 in SHFV-krc1 that was also suggestive of positive selection. 

369 Finally, another difference between SHFV-krc1 and SHFV-krc2 was that no 

370 signal sequence cleavage site could be identified in GP3 of any SHFV-krc1 

371 variant, while a clear signal sequence cleavage site was found C-terminal to the 

372 first predicted transmembrane domain in GP3 of SHFV-krc2 [51]. The most likely 

373 explanation of this finding is that the signal sequence cleavage site of GP3 in 

374 SHFV-krc2 is not utilized, as has been shown for GP3 of EAV [43,49]. Although 

375 the functional significance of these differences between GP3 of SHFV-krc1 and 

376 GP3 of SHFV-krc2 is presently unclear, the potential effect of GP3 on the 

377 immune response and its putative role as a determinant of host cell tropism may 

378 help to explain why SHFV-krc2 mono-infections are twice as common as SHFV-krc1 

379 mono-infections (Figure 2), despite the significantly lower viral loads of 

380 SHFV-krc2 relative to SHFV-krc1 (Figure 3A). 

381 Despite the differences we observed between SHFV-krc1 and SHFV-krc2 

382 in ORF3, we found nearly identical patterns of non-synonymous and 
383 synonymous nucleotide diversity in ORF5, which - by analogy to other 

384 arteriviruses - codes for the major envelope glycoprotein GP5 [17,52]. Two 

385 distinct peaks of non-synonymous diversity were found in the 5'-proximal region 

386 of ORF5, which corresponds to the protein's predicted ectodomain (Figure 7). 

387 This region of GP5 contains the primary neutralizing antibody epitope of PRRSV, 

388 EAV, and LDV [53-56], as well as an immunodominant "decoy" epitope in 

389 PRRSV that may serve to subvert neutralizing antibody responses [57]. These 

390 epitopes align closely with the more 3'-proximal peak of non-synonymous diversity 

391 we identified in SHFV-krc1 and SHFV-krc2 (data not shown), suggesting that 

392 antibody pressure in the red colobus may select for escape mutations in SHFV-krc1 

393 and SHFV-krc2, resulting in the observed genetic diversity of this region. 

394 Glycans in this region of the GP5 ectodomain - in addition to aiding viral 

395 attachment through the binding of host molecules (e.g. sialoadhesin for PRRSV) 

396 [58] - are also implicated in evasion of humoral immune responses by 

397 arteriviruses. Pigs infected with PRRSV variants containing partially de- 

398 glycosylated GP5 mount significantly more robust neutralizing antibody 

399 responses [45,59]. A similar observation was made for LDV in mice, and the 

400 abolishment of N-glycosylation sites in GP5 had the additional effect of altering 

401 the tissue tropism of these "neurotropic" LDV strains [60,61]. Putative N-glycosylation 

402 sites were variably found in association with each peak of non-synonymous 
403 nucleotide diversity identified in ORF5/GP5 of both SHFV-krc1 and 

404 SHFV-krc2 (Figure 7). However, in contrast to the GP5 ectodomains of PRRSV, 

405 EAV, and LDV, a highly conserved hydrophobic stretch of approximately thirty 
406 amino acids separated these two regions of diversity, and was predicted to form 

407 an additional transmembrane domain in both SHFV-krc1 and SHFV-krc2 [62-64]. 

408 A domain that spans the membrane once in this region would place the N- 

409 terminal portion of GP5 - including the region corresponding to the more 5'- 

410 proximal peak of non-synonymous nucleotide diversity - within the virion. While 

411 this possibility cannot be formally excluded, the high sequence diversity of this 

412 region - including multiple putative N-glycosylation sites - suggests that this 

413 scenario is unlikely. Nevertheless, it is conceivable that this region interacts 

414 extensively with the membrane of the virion, and its functional significance, 

415 although obscure, is highlighted by its conservation across all other known 

416 simian arteriviruses including SHFV-LVR, SHFV-krtg1, and SHFV-krtg2 (data not 

417 shown). 
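
The putative N-glycosylation sites discussed above are conventionally predicted from the canonical sequon N-X-S/T, where X is any residue except proline. The manuscript does not state which prediction tool was used, so the following is only a minimal sketch of how such sites, and their variability among variants, might be tabulated. The function names and the toy sequences are hypothetical, and the variant sequences are assumed to be aligned, equal-length amino acid strings.

```python
import re

# Canonical N-linked glycosylation sequon: Asn, any residue except Pro,
# then Ser or Thr. The lookahead lets overlapping sequons all be reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(protein):
    """Return 1-based positions of Asn residues that begin an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


def variable_sequons(variants):
    """Return sequon positions present in some, but not all, variants.

    `variants` maps a variant name to its (aligned) protein sequence.
    The result maps each variable position to the names that carry it.
    """
    carriers = {}
    for name, seq in variants.items():
        for pos in find_sequons(seq):
            carriers.setdefault(pos, []).append(name)
    return {
        pos: names
        for pos, names in sorted(carriers.items())
        if len(names) < len(variants)
    }


if __name__ == "__main__":
    # Toy sequences for illustration only; not taken from SHFV.
    toy = {"variant_a": "MNGSANPT", "variant_b": "MNGAANPT"}
    print(find_sequons(toy["variant_a"]))
    print(variable_sequons(toy))
```

A sequon match only marks a potential site; whether it is actually glycosylated depends on context that a pattern search cannot capture.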

418 The findings presented in this study show that SHFV variants contain high 

419 genetic diversity within their hosts. This presents the possibility that SHFV-krc1 

420 or SHFV-krc2 could evolve rapidly within the red colobus, perhaps gaining 

421 virulence, similar to the recent emergence of highly pathogenic PRRSV in pigs in 

422 China and Southeast Asia [65,66]. As the red colobus population of Kibale faces 

423 the stressors of deforestation and a changing climate, monitoring these infections 

424 may be important to the conservation of this already endangered wild primate 

425 [67]. 
652 FIGURE LEGENDS 

653 

654 Figure 1. Schematic of the SHFV genome. (A) ORFs as they are referred to in 

655 Lauck et al., 2011 [9], labeled sequentially 5'-3': ORF1a-ORF9. Asterisks denote 

656 ORFs identified in SHFV-krc1 and SHFV-krc2 not reported in Lauck et al., 2011 

657 [9]. (B) ORFs as they are named in Snijder et al., 2013 [17], labeled 5'-3': 

658 ORF1a-ORF7, with duplicated ORFs designated by a "prime" (e.g. ORF2a'). 

659 Expression products are given in bold. 
660 

661 Figure 2. Infection of the Kibale red colobus with SHFV-krc1 and SHFV- 

662 krc2. SHFV-krc1 (green) and SHFV-krc2 (purple) infections were identified by 

663 "unbiased" deep sequencing and confirmed by strain-specific qRT-PCR. 

664 

665 Figure 3. Viral loads of SHFV-krc1 and SHFV-krc2 in the Kibale red colobus. 

666 Comparison of SHFV-krc1 (green) and SHFV-krc2 (purple) viral loads from all 

667 animals positive for either virus (A) and viral loads from mono-infections vs. co- 

668 infections of SHFV-krc1 (B) and SHFV-krc2 (C). RNA was isolated from blood 

669 plasma and quantitative RT-PCR was performed using strain-specific primers 

670 and probes designed from deep sequencing data. Statistical significance was 

671 assessed using a two-tailed t-test performed on log-transformed values (CI = 

672 95%). 
673 
674 Figure 4. Pairwise comparison of nucleotide identity among variants of 

675 SHFV-krc1 and SHFV-krc2 from Kibale red colobus (RC). Full coding 

676 sequences for each isolate were aligned using CLC Genomics Workbench. 

677 Numbers show percent nucleotide identity between two variants within (A) SHFV- 

678 krc1 or (B) SHFV-krc2. Colors highlight similarity, with red representing the most 

679 similar sequences and yellow representing sequences with the lowest degree of 

680 nucleotide identity. The same color scale was used for (A) and (B). 
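
The percent nucleotide identity values in Figure 4 were produced with CLC Genomics Workbench. As an illustration of the underlying calculation only, the sketch below computes pairwise identity from sequences that are already aligned. It assumes that columns containing a gap in either sequence are skipped, which may differ from how CLC treats gaps, and the function names and sample identifiers are hypothetical.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent identity between two aligned sequences of equal length.

    Columns where either sequence has a gap ('-') are ignored (an assumption).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def identity_matrix(aligned):
    """Map each unordered pair of variant names to its percent identity."""
    return {
        (m, n): percent_identity(aligned[m], aligned[n])
        for m, n in combinations(sorted(aligned), 2)
    }


if __name__ == "__main__":
    # Toy alignment for illustration only.
    toy = {"RC01": "ACGTACGT", "RC02": "ACGTACGA", "RC03": "AC-TACGT"}
    for pair, value in identity_matrix(toy).items():
        print(pair, round(value, 1))
```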
681 

682 Figure 5. Overall nucleotide diversity of SHFV-krc1 and SHFV-krc2. Mean (± 

683 S.E.) πS (A), πN (B), and πN/πS (C) in monkeys infected with SHFV-krc1 (green) 

684 and SHFV-krc2 (purple). Paired t-tests were performed to compare mean values 

685 between SHFV-krc1 and SHFV-krc2. 

686 

687 Figure 6. Nucleotide diversity of SHFV-krc1 and SHFV-krc2 ORFs. Interaction 

688 graphs comparing mean πS (A) and πN (B) in ORFs from SHFV-krc1 (green) and 

689 SHFV-krc2 (purple). In the case of πN there was a significant ORF-by-virus 

690 interaction (F(13, 459) = 4.39; p < 0.001). Comparison of mean πS (blue) to πN 

691 (red) within ORFs of SHFV-krc1 (C) and SHFV-krc2 (D) revealed substantial 

692 differences among ORFs within each virus. 
693 

694 Figure 7. Nucleotide diversity across ORF5 and ORF7 of SHFV-krc1 and 

695 SHFV-krc2. Mean πS (blue) and πN (red) in sliding windows of 9 codons across 

696 the coding region of ORF5 (A,B) and ORF3 (C,D). Overlapping ORFs are shown 
697 at the bottom. Grey boxes represent predicted transmembrane domains, with 

698 striped grey boxes representing a hydrophobic region unique to the SHFVs. 

699 Green lines depict putative sites of N-glycosylation, with dashed green lines 

700 showing sites that are variably glycosylated. Yellow boxes show predicted signal 

701 peptide cleavage sites that vary in location in GP5 of SHFV-krc1 and SHFV-krc2 

702 and were not found in GP3 of SHFV-krc1. The purple box corresponds to the 

703 unique region of highly variable acidic residues found only in ORF3 of SHFV- 

704 krc2. 
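
Figure 7 summarizes diversity in sliding windows of 9 codons. The sketch below shows one simple way such a window profile could be computed. It assumes that per-codon diversity estimates are already available as a list of numbers, that each window is summarized by the plain average of its codons, and that windows advance one codon at a time. The legend does not specify the step size or how values are aggregated within a window, so both are assumptions, and the function name is hypothetical.

```python
def sliding_window_means(values, window=9, step=1):
    """Mean of `values` in consecutive windows of `window` codons.

    Only full windows are reported, so a sequence shorter than `window`
    yields an empty list.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    means = []
    for start in range(0, len(values) - window + 1, step):
        chunk = values[start:start + window]
        means.append(sum(chunk) / window)
    return means


if __name__ == "__main__":
    # Toy per-codon values for illustration only.
    per_codon = [0.0] * 9 + [0.09, 0.09, 0.0]
    for i, mean in enumerate(sliding_window_means(per_codon), start=1):
        print("window starting at codon", i, "mean", round(mean, 4))
```

Running the same function separately on synonymous and non-synonymous per-codon values gives two profiles that can be overlaid, as in the figure.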